
    A Variable Metric Probabilistic k-Nearest-Neighbours Classifier

    Copyright © 2004 Springer Verlag. The final publication is available at link.springer.com. 5th International Conference, Exeter, UK, August 25-27, 2004. Proceedings. Book title: Intelligent Data Engineering and Automated Learning – IDEAL 2004. The k-nearest neighbour (k-nn) model is a simple, popular classifier. Probabilistic k-nn is a more powerful variant in which the model is cast in a Bayesian framework using (reversible jump) Markov chain Monte Carlo methods to average out the uncertainty over the model parameters. The k-nn classifier depends crucially on the metric used to determine distances between data points. However, scalings between features, and indeed whether some subset of features is redundant, are seldom known a priori. Here we introduce a variable metric extension to the probabilistic k-nn classifier, which permits averaging over all rotations and scalings of the data. In addition, the method permits automatic rejection of irrelevant features. Examples are provided on synthetic data, illustrating how the method can deform feature space and select salient features, and also on real-world data.
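
To illustrate why the metric matters, the following is a minimal sketch of a plain (non-Bayesian) k-nn vote under a per-feature scaling. It is not the paper's method, which averages over metrics by MCMC; the function names, the toy data, and the fixed scale vectors are all assumptions made here for illustration.

```python
import math
from collections import Counter

def scaled_distance(a, b, scales):
    """Euclidean distance after multiplying each feature difference by its scale."""
    return math.sqrt(sum((s * (x - y)) ** 2 for x, y, s in zip(a, b, scales)))

def knn_predict(train, query, k, scales):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: scaled_distance(item[0], query, scales))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: feature 0 carries the class, feature 1 is large-amplitude noise.
train = [
    ((0.0, 5.0), "a"), ((0.1, -4.0), "a"), ((0.2, 9.0), "a"),
    ((1.0, 4.5), "b"), ((0.9, -5.0), "b"), ((1.1, 8.0), "b"),
]
query = (0.15, 4.6)

# Equal scaling lets the noisy feature dominate the neighbourhood.
label_unscaled = knn_predict(train, query, 3, (1.0, 1.0))
# A zero scale on feature 1 rejects it entirely.
label_rescaled = knn_predict(train, query, 3, (1.0, 0.0))
```

On this toy set the two scalings give different votes for the same query, which is the sensitivity that motivates treating the metric as an unknown rather than fixing it in advance.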

    Cardinality constrained portfolio optimisation

    Copyright © 2004 Springer-Verlag Berlin Heidelberg. The final publication is available at link.springer.com. Book title: Intelligent Data Engineering and Automated Learning – IDEAL 2004. 5th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2004), Exeter, UK, August 25-27, 2004. The traditional quadratic programming approach to portfolio optimisation is difficult to implement when there are cardinality constraints. Recent approaches to resolving this have used heuristic algorithms to search for points on the cardinality constrained frontier. However, these can be computationally expensive when the practitioner does not know a priori exactly how many assets they may desire in a portfolio, or what level of return/risk they wish to be exposed to without recourse to analysing the actual trade-off frontier. This study introduces a parallel solution to this problem. By extending techniques developed in the multi-objective evolutionary optimisation domain, a set of portfolios representing estimates of all possible cardinality constrained frontiers can be found in a single search process, for a range of portfolio sizes and constraints. Empirical results are provided on emerging markets and US asset data, and compared to unconstrained frontiers found by quadratic programming.
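
The idea of one frontier per portfolio size can be sketched on a tiny example. The code below is not the evolutionary search described in the abstract: it enumerates every equally weighted portfolio exhaustively, which is only feasible for a handful of assets, and keeps the non-dominated (risk, return) points for each cardinality. The return vector, covariance matrix, equal weighting, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical expected returns and covariance matrix for four assets.
mu = [0.08, 0.12, 0.10, 0.15]
cov = [
    [0.04, 0.01, 0.00, 0.02],
    [0.01, 0.09, 0.02, 0.03],
    [0.00, 0.02, 0.06, 0.01],
    [0.02, 0.03, 0.01, 0.16],
]

def evaluate(assets):
    """Return (variance, expected return) of an equally weighted portfolio."""
    w = 1.0 / len(assets)
    ret = sum(w * mu[i] for i in assets)
    var = sum(w * w * cov[i][j] for i in assets for j in assets)
    return var, ret

def dominates(p, q):
    """p dominates q if it has no more risk and no less return, and differs from q."""
    return p[0] <= q[0] and p[1] >= q[1] and p != q

def frontier(points):
    """Keep only the non-dominated (risk, return) points."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def frontiers_by_cardinality(max_assets):
    """One frontier, sorted by risk, for each portfolio size 1..max_assets."""
    result = {}
    for k in range(1, max_assets + 1):
        points = [evaluate(c) for c in combinations(range(len(mu)), k)]
        result[k] = sorted(frontier(points))
    return result

frontiers = frontiers_by_cardinality(4)
```

Each entry of `frontiers` is a separate trade-off curve for one cardinality, so a practitioner can compare portfolio sizes after the fact instead of fixing the size before searching.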

    In search of lost introns

    Many fundamental questions concerning the emergence and subsequent evolution of eukaryotic exon-intron organization are still unsettled. Genome-scale comparative studies, which can shed light on crucial aspects of eukaryotic evolution, require adequate computational tools. We describe novel computational methods for studying spliceosomal intron evolution. Our goal is to give a reliable characterization of the dynamics of intron evolution. Our algorithmic innovations address the identification of orthologous introns, and the likelihood-based analysis of intron data. We discuss a compression method for the evaluation of the likelihood function, which is noteworthy for phylogenetic likelihood problems in general. We prove that after $O(nL)$ preprocessing time, subsequent evaluations take $O(nL/\log L)$ time almost surely in the Yule-Harding random model of $n$-taxon phylogenies, where $L$ is the input sequence length. We illustrate the practicality of our methods by compiling and analyzing a data set involving 18 eukaryotes, more than in any other study to date. The study yields the surprising result that ancestral eukaryotes were fairly intron-rich. For example, the bilaterian ancestor is estimated to have had more than 90% as many introns as vertebrates do now.

    Observability and nonlinear filtering

    This paper develops a connection between the asymptotic stability of nonlinear filters and a notion of observability. We consider a general class of hidden Markov models in continuous time with compact signal state space, and call such a model observable if no two initial measures of the signal process give rise to the same law of the observation process. We demonstrate that observability implies stability of the filter, i.e., the filtered estimates become insensitive to the initial measure at large times. For the special case where the signal is a finite-state Markov process and the observations are of the white noise type, a complete (necessary and sufficient) characterization of filter stability is obtained in terms of a slightly weaker detectability condition. In addition to observability, the role of controllability in filter stability is explored. Finally, the results are partially extended to non-compact signal state spaces

    Dependence of paracentric inversion rate on tract length

    BACKGROUND: We develop a Bayesian method based on MCMC for estimating the relative rates of pericentric and paracentric inversions from marker data from two species. The method also allows estimation of the distribution of inversion tract lengths. RESULTS: We apply the method to data from Drosophila melanogaster and D. yakuba. We find that pericentric inversions occur at a much lower rate than paracentric inversions. The average paracentric inversion tract length is approximately 4.8 Mb, with small inversions being more frequent than large inversions. If the two breakpoints defining a paracentric inversion tract are uniformly and independently distributed over chromosome arms, there will be more short tract-length inversions than long; we find an even greater preponderance of short tract lengths than this would predict. Thus there appears to be a correlation between the positions of breakpoints which favors shorter tract lengths. CONCLUSION: The method developed in this paper provides the first statistical estimator of the distribution of inversion tract lengths from marker data. Application of this method to a number of data sets may help elucidate the relationship between the length of an inversion and the chance that it will be accepted.
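
The uniform-breakpoint baseline mentioned in the abstract can be made explicit. As a worked sketch, assume a single chromosome arm of length $A$ with the two breakpoints $X$ and $Y$ drawn independently and uniformly on $[0, A]$. The single-arm simplification and the symbols $A$, $X$, $Y$, $T$ are introduced here for illustration and do not come from the source.

```latex
\[
T = |X - Y|, \qquad
P(T > t) = \frac{(A - t)^2}{A^2}, \qquad
f_T(t) = \frac{2\,(A - t)}{A^2} \quad (0 \le t \le A), \qquad
\mathbb{E}[T] = \int_0^A t \, f_T(t) \, dt = \frac{A}{3}.
\]
```

The density $f_T$ decreases linearly in $t$, so short tracts are already favoured under independence. The abstract's finding is that the observed excess of short tracts is larger still than this baseline predicts.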

    Global distribution of two fungal pathogens threatening endangered sea turtles

    This work was supported by grants from the Ministerio de Ciencia e Innovación, Spain (CGL2009-10032, CGL2012-32934). J.M.S.R. was supported by a PhD fellowship of the CSIC (JAEPre 0901804). The Natural Environment Research Council and the Biotechnology and Biological Sciences Research Council supported P.V.W. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Thanks to Machalilla National Park in Ecuador, Pacuare Nature Reserve in Costa Rica, Foundations Natura 2000 in Cape Verde and Equilibrio Azul in Ecuador, Dr. Jesus Muñoz, Dr. Ian Bell, Dr. Juan Patiño for help and technical support during sampling. Peer reviewed.

    A framework for orthology assignment from gene rearrangement data

    Gene rearrangements have successfully been used in phylogenetic reconstruction and comparative genomics, but usually under the assumption that all genomes have the same gene content and that no gene is duplicated. While these assumptions allow one to work with organellar genomes, they are too restrictive when comparing nuclear genomes. The main challenge is how to deal with gene families, specifically, how to identify orthologs. While searching for orthologies is a common task in computational biology, it is usually done using sequence data. We approach that problem using gene rearrangement data, provide an optimization framework in which to phrase the problem, and present some preliminary theoretical results.

    Phylogenetic analysis of Croatian orf viruses isolated from sheep and goats

    BACKGROUND: The Orf virus (ORFV) is the prototype of the parapoxvirus genus and it primarily causes contagious ecthyma in goats, sheep, and other ruminants worldwide. In this paper, we describe the sequence and phylogenetic analysis of the B2L gene of ORFV from two natural outbreaks: i) in autochthonous Croatian Cres-breed sheep and ii) on a small family goat farm. RESULTS: Sequence and phylogenetic analyses of the ORFV B2L gene showed that Cro-Cres-12446/09 and Cro-Goat-11727/10 did not cluster together. Cro-Cres-12446/09 shared the highest similarity with ORFV NZ2 from New Zealand, and Ena from Japan; Cro-Goat-11727/10 was closest to HuB from China and Taiping and Hoping from Taiwan. CONCLUSION: Distinct ORFV strains are circulating in Croatia. Although ORFV infections are found ubiquitously wherever sheep and goats are farmed in Croatia, this is the first information on the genetic relatedness of any Croatian ORFV to other isolates around the world.

    Sampling solution traces for the problem of sorting permutations by signed reversals

    International audience. BACKGROUND: Traditional algorithms to solve the problem of sorting by signed reversals output just one optimal solution while the space of all optimal solutions can be huge. A so-called trace represents a group of solutions which share the same set of reversals that must be applied to sort the original permutation following a partial ordering. By using traces, we therefore can represent the set of optimal solutions in a more compact way. Algorithms for enumerating the complete set of traces of solutions were developed. However, due to their exponential complexity, their practical use is limited to small permutations. A partial enumeration of traces is a sampling of the complete set of traces and can be an alternative for the study of distinct evolutionary scenarios of big permutations. Ideally, the sampling should be done uniformly from the space of all optimal solutions. This is however conjectured to be ♯P-complete. RESULTS: We propose and evaluate three algorithms for producing a sampling of the complete set of traces that instead can be shown in practice to preserve some of the characteristics of the space of all solutions. The first algorithm (RA) performs the construction of traces through a random selection of reversals on the list of optimal 1-sequences. The second algorithm (DFALT) consists in a slight modification of an algorithm that performs the complete enumeration of traces. Finally, the third algorithm (SWA) is based on a sliding window strategy to improve the enumeration of traces. All proposed algorithms were able to enumerate traces for permutations with up to 200 elements. CONCLUSIONS: We analysed the distribution of the enumerated traces with respect to their height and average reversal length. Various works indicate that the reversal length can be an important aspect in genome rearrangements. The algorithms RA and SWA show a tendency to lose traces with high average reversal length. Such traces are however rare, and qualitatively our results show that, for testable-sized permutations, the algorithms DFALT and SWA produce distributions which approximate the reversal length distributions observed with a complete enumeration of the set of traces.
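
For readers unfamiliar with the underlying operation, here is a minimal sketch of a signed reversal and of two solutions that use the same set of reversals in different orders, which is the kind of grouping a trace captures. It does not implement RA, DFALT, or SWA; the function names, the 0-based inclusive indexing, and the example permutation are assumptions made here.

```python
def apply_reversal(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive, 0-based) and flip each sign."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]

def is_sorted(perm):
    """True if perm is the identity +1, +2, ..., +n."""
    return perm == list(range(1, len(perm) + 1))

def apply_sequence(perm, reversals):
    """Apply a list of (i, j) reversals in order and return the final permutation."""
    for i, j in reversals:
        perm = apply_reversal(perm, i, j)
    return perm

# Hypothetical signed permutation: elements 1 and 3 are inverted in place.
start = [-1, 2, -3]

# Two sorting sequences built from the same two non-overlapping reversals.
order_a = [(0, 0), (2, 2)]
order_b = [(2, 2), (0, 0)]

end_a = apply_sequence(start, order_a)
end_b = apply_sequence(start, order_b)
```

Both orders sort `start`, because the two reversals act on disjoint positions and so commute. Representing them once, as a set with no ordering constraint between the two, is more compact than listing both sequences.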

    Probabilistic Phylogenetic Inference with Insertions and Deletions

    A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new “concordance test” benchmark on real ribosomal RNA alignments, we show that the extended program dnamlε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm